Analysis of performance data from a relational database system for applications using stored procedures or sql

ABSTRACT

Analyzing performance data from a relational database to discover poorly performing SQL statements includes creating event records of performance data for query statements executed by a relational database management system, and condensing the performance data in the event records. A performance summary can be generated for each condensed event record. The performance summaries can then be analyzed based on performance characteristics to identify poorly performing SQL statements.

CROSS-REFERENCE TO RELATED APPLICATIONS

Under 35 USC §120, this application is a continuation application andclaims the benefit of priority to U.S. patent application Ser. No.11/080,171, filed Mar. 15, 2005, entitled “Analysis of Performance Datafrom a Relational Database System for Applications Using StoredProcedures or SQL,” which is incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to the field of database management andperformance analysis.

BACKGROUND OF THE INVENTION

Data records in a relational database management system (RDBMS) in acomputer are maintained in tables, which are a collection of rows havingthe same columns. Each column maintains information on a particular typeof data for the data records of the rows. Data from the database tablesmay be accessed using instructions in a language that the databasemanagement system recognizes. For example, Structured Query Language(SQL) statements are often used to query, access, and manipulate data ina database.

A SQL statement includes set operators that define operations to beperformed by the database management system when searching columns ofdatabase tables. For instance, a SQL set operator may look for allrecords having a field that satisfies a search condition, such as equalto or less than a certain value. Also, the operator can be a join, whichcombines or concatenates rows from different tables, then searches forthose concatenated rows that satisfy a search predicate. Rows from thetables involved in the join operation that do not satisfy the predicateor condition are excluded from the join result.

There are two methods of developing applications which interface with arelational database system. The first is to use static or dynamic SQLstatement calls from within the application program. A dynamic SQLstatement is constructed by the application program, “prepared” orcomplied with a call to the database management system, and thenexecuted. A static SQL statement is coded into the application program,and prepared or compiled prior to running the application program.

The second method of developing applications which interface with arelational database system identifies common functions that call one ormore SQL statements, and encapsulates each function as a StoredProcedure to be executed at the database server instead of theapplication, improving performance and in many cases simplifyingapplication development.

When an application program or Stored Procedure is compiled, a separatestep is performed to create an access plan for each SQL statement usingthe operators and search values in the statement to retrieve data in anefficient manner. For a static SQL statement, the access plan is basedon the state of the database at the time the plan is generated. Theaccess plan can be recreated using the current state of the database atany time by first updating database statistics and then “re-binding”.Most of the access plans generated by this process are optimal.

However, a few poorly performing statements can consume a significantamount of the database management system resources, resulting in aslowdown of the system. To find statements which use a large amount ofCPU or I/O time, an event monitor in the database management systemcreates an event record of each SQL statement or stored procedure thatis executed by the system, to record performance information for theexecuted statement.

When a SQL statement has a performance problem, such as a high executiontime or a high number of rows read, written or sorted, finding the SQLstatement with this poor execution performance is generally the firststep in correcting the problem. For example, the event records for theexecuted SQL statements are studied to find the statements having thesetypes of performance problems. However, the event monitor can includerecords for thousands or millions of executed statements, and eachrecord can contain many lines with performance information for a singleSQL statement. As a result, searching this amount of data to locate aproblem statement is very time consuming, and is therefore often notdone.

If the problem statement is executed infrequently, such as once atstart-up, then the overall impact on system performance may beinsignificant, and the problem statement can be left in place. However,if the problem statement is executed frequently, and accounts for asignificant reduction in system performance, then the problem statementneeds to be identified so that its execution performance, and theoverall performance of the system, can be improved.

Without a tool to summarize the performance records of the executedstatements, identifying the problem statement, and the statement'soverall impact on system performance, is very difficult. There istherefore a need for a summary of the performance of SQL statements sothat poorly performing statements can be identified.

Furthermore, the event monitor typically includes the text of dynamicSQL statements, but not the text of static SQL statements. For staticSQL statements, which are coded into the application program, theirpreparation or compilation is done prior to running the applicationprogram. Therefore, the event monitor contains only the package (ormodule) name and section number of the SQL statement, which indicatesthe relative position of the SQL statement within the module, ratherthan the actual text itself.

As a result, most users who are not familiar with the internal storageused by the database management system often attempt to locate the textof the static SQL statement within the source code of the applicationprogram using the package name and section number from the eventmonitor, but this can be time consuming and error prone. This may alsobe impossible, if the user does not have access to the applicationsource code. A user may also be able to either use a tool, such as a DB2control center, or have the ability to query a system table, to find thestatic SQL statement. However, if the event monitor is from anothersystem, then the user may have to ask a customer to find the text of thestatic statement.

There is therefore also a need to be able to identify the text of apoorly performing static SQL statement from the performance data so thatthe statement can be further analyzed and tuned for improvedperformance.

SUMMARY OF THE INVENTION

The compression of raw data from an event monitor into a compact form,such as one line per SQL statement, is performed to show relevantdetails for the statements, such as the text of static or dynamic SQLstatements. This allows a user to more easily see the complete flow ofoperations and performance of each statement, both within a singleapplication context and across the system.

In one embodiment, the compression technique produces a summary of theevent monitor data to provide statistics such as frequency of executionand cost of execution of each statement.

An overall summary can also be produced, to show information such as theelapsed time of the test, the number of stored procedures executed, thenumber of SQL statements executed, and the wait time, for example. Otherdata which a user deems as necessary for understanding applicationperformance may also be included in the overall summary.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a computer system that includes an analysis tool toperform an analysis of performance data.

FIG. 2 shows an example of a method of analyzing performance data from arelational database to discover poorly performing SQL statements.

FIG. 3 shows an example of an event record that is processed by themethod of analyzing performance data.

FIG. 4 shows an example of performance data for a transaction that isprocessed by the method of analyzing performance data.

FIG. 5 shows an example of a condensed report generated by the method ofanalyzing performance data.

FIG. 6 shows an example of condensed summaries for a set of executed SQLstatements.

FIG. 7 shows a statement identified as a result of sorting the SQLstatements of FIG. 6 based on total execution time.

FIG. 8 shows an example of a summary of performance data for statementsissued by an application program.

FIG. 9 shows an example of a computer processing system for analyzingperformance data.

DETAILED DESCRIPTION OF THE INVENTION

The following description is presented to enable one of ordinary skillin the art to make and use the invention, and is provided in the contextof a patent application and its requirements. Various modifications tothe preferred embodiment and the generic principles and featuresdescribed herein will be readily apparent to those skilled in the art.Thus, the present invention is not intended to be limited to theembodiments shown, but is to be accorded the widest scope consistentwith the principles and features described herein.

An analysis of performance data from a relational database can produceinformation from the raw performance data to aid in improving SQLstatements and stored procedures. A summary of performance data for eachexecuted statement can be generated by the analysis process. Also, asecond summary of performance data for a statement executed multipletimes can be generated from the first summaries. A third summary ofperformance data for executed statements associated with a givenapplication program can also be generated from the first summaries.

The analysis of performance data from a relational database discoverspoorly performing SQL-based applications by producing a summary ofexecution information for SQL statements and Stored Procedures. In oneembodiment, an event monitor report is processed to condense many linesof a performance record for an executed statement into a smaller numberof lines. For example, a summarization tool can analyze event monitordata by reading a formatted event monitor file, extracting static SQLstatements from a table, and creating a one line performance report foreach executed statement.

FIG. 1 illustrates a computer system that includes an analysis tool toperform an analysis of performance data. One or more applicationprograms 110 issue database operations in the form of Stored Procedurecalls or SQL statements 115 to compiler 120 of the relational databasemanagement system (RDBMS). A Stored Procedure is an encapsulation ofapplication logic and SQL statements which is executed directly undercontrol of the database system instead of being executed from a remoteclient application. Two types of SQL statements may be executed, staticor dynamic. A static SQL statement is fully defined as part of theapplication, and is compiled along with the application. A dynamic SQLstatement is constructed at run time.

For each database operation, compiler 120 receives the corresponding SQLstatement and produces an execution plan for the statement. Theexecution plan is sent to the execution engine 130, which executes theplan to retrieve data from database 140 and return the query result 150to the application program 110 that issued the query. The executionperformance of the plan is recorded by the event monitor 160.

This execution performance information is received by the analysis tool170, to produce a summary of the execution performance information forthe statement. Also, because each application program can issue hundredsor thousands of database queries, the analysis tool 170 can produce asummary of the execution performance information for the set of SQLstatements issued by that application program.

A method of analyzing performance data from a relational database todiscover poorly performing SQL statements is shown in FIG. 2. An eventmonitor creates event records containing performance data for SQLstatements that are executed by a database management system, 210. Inone embodiment, each event record contains information about anexecution of a SQL statement.

The data in the event records is condensed, 220. For example, certaininformation from the event records can be selected for preservation in asummary, and the non-selected information can be discarded. Certaininformation that is not available in the event record may also be addedto the summary during this procedure. For example, if the executed eventis a static SQL statement, then the text of the static SQL statement maybe extracted from a table and included with the other relevantperformance data. The form of presentation of the information obtainedfrom the event record can be changed, 230. For example, the arrangementof the performance data can be changed from multiple lines as displayedin the event records, into a collection of formatted columns, to improvereadability.

A summary of performance data for each executed SQL statement isgenerated, 240. In one embodiment, the summarization tool produces a oneline summary of information about the performance of each executedstatement by inserting the condensed performance data for each statementinto a row under the formatted columns. In addition to generating a oneline summary for each executed statement, the tool can produce a summaryof a set of executed statements, 250. For example, the tool cansummarize the performance of all SQL statements or stored procedurecalls issued by a given application.

The summaries for a given application can be sorted to rank thestatements based on one or more performance characteristics, such astheir elapsed execution time, 260. The summary information for the SQLstatements with a relatively high ranking can be examined to identifypoorly performing SQL statements, 270. These poorly performingstatements, and their corresponding application programs, can be tunedor changed to improve the overall performance of the database managementsystem.

An example of an event record that is created by the event monitor in210 of FIG. 2 is shown in FIG. 3. This event record shows an example ofthe performance related data associated with one executed statement.Certain performance related data of FIG. 3 can be useful in analyzingthe performance of the statement and in identifying a poorly performingstatement. For example, information such as the application ID field,the text of the statement, the start and stop time, and the executiontime can be used to identify a poorly performing statement.

The Application ID field is associated with one connection from anapplication. A single event monitor report might contain entries forhundreds or even thousands of concurrently executing applications, sothis field is important to identify the application program thatexecuted this statement. This field can also be used to identify the setof statements issued by a given application.

The event report shows that the executed statement in the example ofFIG. 3 is a dynamic SQL statement from package ICMPLSLM and section 3.Since it is a dynamic SQL statement, the text is included in the eventmonitor. If it had been a static SQL statement, the text would not havebeen included in the event monitor. The start and stop time shown by theevent record can be used to observer the relative order of execution,and to gain an understanding of the relationship of this statement toother statements executing at the same time.

The event record includes the execution time, which is performanceinformation that frequently gives significant insight into whether thestatement performed well. In this case, the statement required 2.7seconds to complete, which suggests that further analysis may beperformed to understand whether this represents a performance problem.For example, since this is a “close” statement, the elapsed time dependson the application which is calling the stored procedure. The length oftime could be due to the application not processing the data quickly, orsimply leaving the cursor open. Without seeing the other statementsexecuted by the transaction at the same time, it is difficult tospeculate about the possible reason the cursor was open for 2.7 seconds.

The event monitor report in the example of FIG. 3 shows that 2 rows werefetched, 3 rows were read, and no rows were inserted, updated ordeleted. Physical buffer pool reads represent disk I/O, and logicalbuffer pool reads represent access to pages already in memory. In thisexample, these values are relatively low, suggesting that there may nota problem with SQL performance for this statement.

In addition to producing an event report for a statement, the eventmonitor can produce a report for a transaction, as shown in FIG. 4. Adatabase application will typically execute multiple SQL statements aspart of a single transaction, with the end of the transaction beingcaused either implicitly or explicitly by the application. Except inrare cases, database cursors are closed when a transaction ends. Sincedatabase locks can be held during a transaction, it is important tounderstand the impact of long-running transactions on the overallperformance of the application. FIG. 4 shows the data captured in anevent monitor for a transaction.

The exemplary record of transaction performance data of FIG. 4 showsthat the transaction duration was over 23 seconds, there were 31 locksheld, and lock wait time was 15 milliseconds. Although lock wait time isthis example is short, a long-running transaction such as this can causeother transactions to lock, which could affect overall systemperformance. By capturing each transaction event with the applicationID, it is possible to discover whether other transactions executing atthe same time were blocked, and the overall cost/elapsed time of eachtransaction in an application.

As can be seen from the event record of FIG. 3, a large amount ofperformance information, often more than one page, is presented for eachexecuted SQL statement, even though only a subset of this informationmay be relevant in determining whether the statement performed poorly.To better analyze the relevant performance results, the relevantportions of the report are filtered from the event monitor report. Thismay be done by processing the raw event monitor file to identify andcondense reports for each Statement, Transaction, or Connection event.

For example, the process may express the relevant data condensed fromeach report as a single line as shown in FIG. 5. The condensed summaryfile produced in this example has one line of relevant performance datafor each event, and can be conveniently viewed by many text editors.(For illustration purposes only, a selection of a few lines of the fileis shown as three separate sections in FIG. 5, and the SQL statementsare truncated to improve readability.)

To condense the relevant performance measurements in a summary, theprocess can select data such as, for example, the application ID, stoptime, and elapsed time from the event record of FIG. 3. Then, theprocess can change the format of the condensed performance data from theevent records into a collection of formatted columns, where each columnidentifies a performance measurement preserved from the event record.The process can then create a summarization table by adding eachcondensed record as a row to the collection of formatted columns, suchthat each row in the table contains relevant performance data for anexecuted SQL statement. This process produces a performance summary ofeach individual statement, as shown in FIG. 5.

The summary of a statement's performance can include the SQL time, whichis the amount of time that elapsed during execution of the statement.The summary also includes the application time, which is the amount oftime between the end of execution of the previous statement and thebeginning of execution of the current statement. This allows theanalysis process to distinguish between time spent within SQL statementsand time spent between SQL statements, so that the performance of theapplication code itself can be understood without using a code profiler.A user is then able to identify areas of the code that can be improved.

In this example, FIG. 5 includes a column labeled “App Time” forApplication time, which is the time between SQL statements. This datacan be used to compare and contrast the time spent executing a SQLstatement with the time spent executing the application. This comparisoncan be valuable in determining whether poor performance is caused by thestatement or by the application. For example, if the application time ishigh relative to SQL time, then the focus of performance analysis wouldbe on the application logic rather than the statement.

The summary can also include the execution start time, the package ormodule name containing the SQL statement, and the section, which is usedto find the text for a static SQL statement. Other information about thestatement, such as the number of rows read, written and sorted, the type(static or dynamic), the operation (e.g., commit, execute, prepare,rollback), the return code, or the text of the SQL statement itself, canbe included in the summary. For example, during the creation of asummary for a static SQL statement, the text of the static SQL statementcan be retrieved from a statement table and added to the summary. In oneembodiment, the user can select any information reported by the eventmonitor to be included in the summary.

In addition to compressing performance information for each executed SQLstatement from the event monitor, the data compression program cancreate summaries for Stored Procedure records in the event monitor. AStored Procedure is a collection of SQL statements and application logicwhich is executed at the database server. The summary for a StoredProcedure can contain the difference between the stop time of a storedprocedure and the start time of the next for each Application ID. Thisperformance data can indicate time spent performing functions outside ofthe stored procedures, such as application time, network time, or theoverhead for the database system to begin execution of a storedprocedure following a call, for example.

The condensed summaries of in FIG. 3 show that the 7 event recordsextracted from the event monitor file are associated with 6 differentdatabase connections. By creating a single line in the condensed summaryfile for each event, a user can sort the summary file by Application ID(or connection) and the stop time of the statement to see theperformance activity for each application. By extracting lines for aselected application ID, a user can see the summary of each SQLstatement in the order in which it was executed, to provide insight intothe behavior of the application. Also, by extracting only the recordswhere the Type column contains TRX, it is possible to see the elapsedtime, statistics for rows and buffer pools, and the lock wait time foreach transaction.

Although this example of 7 records is relatively easy to view andunderstand, the event monitor captured for an actual test can includethousands or even millions of records. To more clearly understand theoverall behavior of an application, the compressed summaries shown inFIG. 5 can be further compressed into a second set of summaries as shownin FIG. 6.

A second compression process uses the summaries of FIG. 5 to generateinformation about a set of executed SQL statements as shown in FIG. 6.For example, when one basic statement is expressed as several dynamicSQL statements, each having the same package and section identifiers butdifferent text, the execution performance information for the severalstatements is captured by the second compression process in a summaryreport of the total performance of the several statements. Thiseliminates multiple entries for the same basic statement, so that theimpact of the statement on overall performance is clearer.

As shown in FIG. 6, the summary of performance data for a basicstatement executed multiple times includes three steps in executing onedynamic SQL statement—prepare, open, and close. This statement wasexecuted 10 times, with an average SQL time of 0.294 seconds. Althoughthe maximum number of rows read was 25, the average was 3 with 2 rowsfetched. The percent of total elapsed time for all 3 steps in thisstatement is only 0.41%. Even though one execution seemed to indicatethat there could be a performance concern, the overall impact of thestatement is insignificant. By using a text editor or other tool to sortthe file based on percent of SQL time, it is easy to find the statementscontributing the most to the elapsed time of the application.

The summary information of FIG. 6 can be analyzed to detect poorlyperforming SQL statements or stored procedures. For example, if thesummary file includes an average elapsed time of each SQL statement froman application, and the percentage of total execution time for each ofthe SQL statements, a user can sort the summary file based on thepercentage of total execution time to identify a SQL statement which hasthe greatest impact on system performance. FIG. 7 shows a statementidentified as a result of sorting the SQL statements of FIG. 6 based ontotal execution time.

The statement identified in FIG. 7 significantly contributes to theelapsed time of an application. This statement can be reviewed todetermine whether design changes will reduce the execution time. Theoverall system performance can then be improved by reviewing theidentified statements for coding changes, database tuning changes, orapplication design changes to eliminate them entirely.

This statement was executed 26 times with an average execution time of5.3 seconds. This statement alone accounted for 19% of the elapsed timeof the test, so would represent a significant opportunity to improveoverall application performance. In this example, the complete SQLstatement is included, so that this poorly performing statement can beanalyzed for improvement.

The large number of columns being inserted could certainly affectperformance. Additional analysis would show that some of these columnsare “character large objects” (CLOBs) which due to the nature of thedatabase management system are much slower to insert. By selecting onlythe records for package ICMOSDOE and section 17, we can see that oneexecution of this statement took 0.237 seconds, one took 10.5 seconds,one took 15.7 seconds, and the rest took approximately 2.6 seconds.Further review of the detail file shows that the transaction which wasexecuted for the statement that took 15.7 seconds had no lock wait time,so there is no database design issue. From this analysis, a user couldconclude that there is probably an I/O or (less likely due to the longtime) a CPU constraint causing the insert to be very slow, and would useoperating system tools to find whether this is the cause of poorperformance.

The summaries of FIG. 5 can also be compressed into a summary ofperformance data for the statements issued by each application program,as shown in FIG. 8. While the summaries for individual statements, asshown in FIG. 5 for example, can give a clear view of the behavior ofthe entire system or an individual application, and the cumulativesummary files of FIGS. 6 and 7 can help identify statements that shouldbe investigated to improve performance, further summarization can beperformed to characterize performance of an individual application, asshown in FIG. 8.

In this example, performance data was gathered from a test that was runfor 134 seconds, with 139 stored procedure and 2471 SQL statementsexecuted. (Since a database system can manage multiple concurrentapplications, the total elapsed time can be higher than the duration ofthe test in this example.) The time within stored procedures was 288seconds, and the time within SQL statements 723 seconds. (SQL elapsedtime is higher than stored procedure elapsed time because the Closestatement for a cursor returned to the application occurs after thestored procedure ends.) By including the number of SQL statementsexcluding Close and Prepare, and the elapsed time for these statements,it is possible to see that these accounted for approximately 50% of theSQL time. The lock wait time of 38 seconds is high for such a shorttest, and suggests that either database or application design issuesneed to be investigated.

FIG. 9 shows an example of a computer processing system 900 that canperform an analysis of performance data from a relational databasesystem. A computer program of executable instructions can be stored in acomputer-readable medium, such as system memory 906, static storagedevice 908, or disk drive 910, for example. For the purposes of thisspecification, the terms “machine-readable medium” and“computer-readable medium” shall be taken to include any medium that iscapable of storing or encoding a sequence of instructions for executionby the machine and that cause the machine to perform any one of themethodologies of the present invention. The terms “machine-readablemedium” and “computer-readable medium” shall accordingly be taken toinclude, but not be limited to, solid-state memories, optical andmagnetic disks, and a carrier wave that encodes a data signal.

A processor 904 can retrieve the instructions from memory through bus902, and execute the instructions. A user can also input instructions tothe processor through bus 902 using an input device 916, such as akeyboard, or a cursor control device 918, such as a mouse. Theprocessing system 900 can display information to the user using adisplay 914, such as a computer monitor. The processing system may sendand receive information to other electronic devices over a networkthrough communication interface 912, such as a modem for example, andcommunication link 920.

An analysis of performance data from a relational database system forapplications using stored procedures or SQL has been discussed. Theanalysis can track a number of rows fetched, read, written and sorted,and a number of buffer pool reads. The analysis can report thisinformation for the execution of the statement in a summary line toallow tools such as grep to extract subsets of information. The analysiscan also provide a summary that contains a number of stored proceduresexecuted with an elapsed time, a number of SQL statements with elapsedtime, SQL statements excluding close and total transaction time.

The present invention has been described in accordance with theembodiments shown. One of ordinary skill in the art will readilyrecognize that there could be variations to the embodiments, and thatany variations would be within the spirit and scope of the presentinvention. Accordingly, many modifications may be made by one ofordinary skill in the art without departing from the spirit and scope ofthe appended claims.

1. An system for analyzing performance data comprising: means forreceiving event records of performance data for query statementsexecuted by a relational database management system; and means forcondensing the performance data in the event records.
 2. The system ofclaim 1 further comprising means for generating, from each condensedevent record, a performance summary for the executed statement.
 3. Thesystem of claim 1 further comprising means for analyzing the performancesummaries for the executed statements based on one or more performancecharacteristics to identify one or more statements.
 4. The system ofclaim 1 further comprising means for linking each performance summary tothe text of the corresponding query statement executed by the relationaldatabase management system.
 5. The system of claim 1 further comprising:means for identifying a set of performance summaries for querystatements issued by the application program for each applicationprogram that issued query statements to the relational databasemanagement system; and means for generating a performance summary forthe application program from the corresponding identified set of querystatement performance summaries.
 6. The system of claim 1 furthercomprising means for analyzing the application program performancesummaries to identify one or more application programs.
 7. The system ofclaim 1 further comprising means for analyzing the performance summariesfor the query statements issued by the application program to identifyone or more query statements issued by the application program.
 8. Thesystem of claim 1 further comprising means for analyzing the applicationprograms and the query statements to improve their performance.
 9. Acomputer-readable storage medium storing program instructions, theprogram instructions which when executed by a computer system cause thecomputer system to execute a method comprising: receiving event recordsof performance data for query statements executed by a relationaldatabase management system; and condensing the performance data in theevent records.
 10. The computer-readable storage medium of claim 9further comprising program instructions for generating, from eachcondensed event record, a performance summary for the executedstatement.
 11. The computer-readable storage medium of claim 9 furthercomprising program instructions for analyzing the performance summariesfor the executed statements based on one or more performancecharacteristics to identify one or more statements.
 12. Thecomputer-readable storage medium of claim 9 further comprising programinstructions for linking each performance summary to the text of thecorresponding query statement executed by the relational databasemanagement system.
 13. The computer-readable storage medium of claim 9further comprising program instructions for: for each applicationprogram that issued query statements to the relational databasemanagement system, identifying a set of performance summaries for querystatements issued by the application program; and generating aperformance summary for the application program from the correspondingidentified set of query statement performance summaries.
 14. Thecomputer-readable storage medium of claim 9 further comprising programinstructions for analyzing the application program performance summariesto identify one or more application programs.
 15. The computer-readablestorage medium of claim 9 further comprising program instructions for,for each application program, analyzing the performance summaries forthe query statements issued by the application program to identify oneor more query statements issued by the application program.
 16. Thecomputer-readable storage medium of claim 9 further comprising programinstructions for analyzing the application programs and the querystatements to improve their performance.